X-chromosome inactivation results in dosage equivalence between the X chromosome in males and females; 
however, over 15% of human X-linked genes escape silencing and these genes are enriched on the evolutionarily 
younger short arm of the X chromosome. The spread of inactivation onto translocated autosomal material allows 
the study of inactivation without the confounding evolutionary history of the X chromosome. The heterogeneity 
and reduced extent of silencing on autosomes are evidence for the importance of DNA elements underlying the 
spread of silencing. We have assessed DNA methylation in six unbalanced X-autosome translocations using the 
Illumina Infinium HumanMethylation450 array. Two to 42% of translocated autosomal genes showed this mark of 
silencing, with the highest degree of inactivation observed for trisomic autosomal regions. Generally, the extent 
of silencing was greatest close to the translocation breakpoint; however, silencing was detected well over 
100 kb into the autosomal DNA. Alu elements were found to be enriched at autosomal genes that escaped 
from inactivation while L1s were enriched at subject genes. In cells without the translocation, there was enrichment 
of heterochromatic features such as EZH2 and H3K27me3 for those genes that become silenced when 
translocated, suggesting that underlying chromatin structure predisposes genes towards silencing. 
Additionally, the analysis of topological domains indicated physical clustering of autosomal genes of 
common inactivation status. Overall, our analysis indicated a complex interaction between DNA sequence, chro- 
matin features and the three-dimensional structure of the chromosome. 



INTRODUCTION 

X-chromosome inactivation (XCI) occurs early in mammalian 
development to transcriptionally silence one of the X chromo- 
somes in females, and generally results in dosage compensation 
for X-linked genes between XY males and XX females. 
However, a surprising 15% of human genes continue to show 
substantial expression from the inactive X chromosome (Xi) 
and thus are said to escape from XCI (1). While some of these 
genes retain Y homologs and are dosage compensated, the re- 
mainder are candidates for sexually dimorphic phenotypes 
(reviewed in 2). In order to understand how genes can escape 
from the spread of facultative heterochromatin on the Xi, 
several groups have undertaken bioinformatic studies of the 
DNA sequences surrounding genes that escape from or are 
subject to XCI (3-6). However, as the frequency with which 
genes escape from XCI increases in regions of the X chromo- 
some that diverged more recently from the Y chromosome or 
were more recent additions to the X chromosome (7), evolution- 
ary hitch-hiking may confound the identification of DNA ele- 
ments involved in the spread of XCI. Strikingly, long 
interspersed nuclear element 1 (L1) elements have been 
shown to be enriched in regions of genes subject to XCI, and 
are also enriched on autosomes that spread XCI effectively 
when translocated onto the Xi (8), an approach that minimizes 
the evolutionary bias. 

In individuals with unbalanced X;autosome translocations 
[t(X;A)]s, it is generally the t(X;A) that is inactivated (9), with 
inactivation spreading into autosomal material attached to the 
Xi. The extent of autosomal silencing is variable and generally 
less than that typically observed on the X chromosome, 
leading Gartler and Riggs (10) to hypothesize that waystations, 
which act as booster elements to propagate the inactivation 
signal, are more frequent on the X chromosome than autosomes. 
Additional DNA elements are likely involved in determining 
which genes are subject to, or escape from, XCI. Notably, mul- 
tiple different single-copy X-linked integrations of a bacterial 
artificial chromosome containing the mouse escape gene 
Kdm5c as well as flanking genes subject to XCI recapitulated 
XCI at multiple locations on the X chromosome, suggesting 
that escape from XCI is an intrinsic feature of the local DNA 
sequence (11). In contrast, studies examining the frequency of 
repetitive elements on the X chromosome found that larger 
windows of DNA sequence are more accurate at predicting 
XCI status (4,6), suggesting that waystations may act at the 
level of large domains. Intriguingly, a smaller proportion of 
X-linked genes escape from XCI in mouse than in humans 
(12), and in conserved escape regions the domain is larger in 
humans possibly due to the loss of the boundary element 
CCCTC-binding factor (CTCF) (13,14). However, a DNA insu- 
lator containing CTCF-binding sites was unable to protect a 
transgene from XCI (15), reinforcing that there is likely interplay 
among elements that favour the spread of XCI (waystations), 
elements that favour ongoing expression from the Xi (escape 
elements) and elements that serve as boundaries to one or both of these. 
In order to identify candidate genomic regions for such 
sequences, we have undertaken an examination of the extent of 
inactivation on the autosomal portion of unbalanced t(X;A)s. 

The spread of inactivation into the autosomal portion of unba- 
lanced t(X;A)s has been shown functionally (16) and by reverse 
transcription polymerase chain reaction expression analyses of 
individual genes (17). In agreement with earlier replication- 
timing-based studies, it has been shown that inactivation is not 
contiguous across autosomes (18,19); however, there is not 
complete concordance between silencing and detectable late 
replication-timing (20). Of the other features of an Xi, a better 
correlation with inactivation has been observed for heterochro- 
matic histone modifications (21), while association of the non- 
coding XIST RNA, which is essential for establishment, but 
not maintenance, of XCI could be lacking from the autosomal 
portion of the unbalanced t(X;A) despite silencing and other 
marks of an Xi (22). DNA methylation (DNAm) at a limited 
number of autosomal genes showed good agreement with the 
inactivation status predicted based on expression (21,23). Overall, 
previous studies have in combination demonstrated silencing for 
approximately two-thirds of the ~70 autosomal genes examined 
(reviewed in 2). Studies of the spread of inactivation are further 
complicated by the selective pressure exerted on cells which 
contain t(X;A)s. When the autosomal portion of an unbalanced 
t(X;A) is disomic, extensive silencing will result in under- 
expression of inactivated genes possibly leading to cell death. 
Conversely, when the autosomal portion of an unbalanced 
t(X;A) is trisomic, cells are likely to be selected for when 
inactivation is more extensive as this will achieve a more typical 
disomic expression pattern, potentially minimizing the negative 
phenotype associated with the trisomy of that autosome (24). 
The discontinuous distribution, and the combination of some, but 
not all, marks of XCI in unbalanced t(X;A)s suggest a complicated 
relationship between the underlying DNA sequence and 
the spread of silencing and maintenance of XCI. 

Increased DNAm in females relative to males at cytosine- 
guanine dinucleotide (CpG) islands, including both high (HC) 
and intermediate (IC) CpG density (25), associated with 
X-linked genes subject to XCI has been well documented for 
many individual genes, and more recently genomic methodolo- 
gies to assess DNAm chromosome-wide have supported the use 
of DNAm to predict the XCI status of genes (26-28). Genes that 
escape XCI show similarly low levels of DNAm in males and 
females, while genes subject to XCI show DNAm levels 
approaching 50%, presumably reflecting decreased DNAm on 
the active X chromosome (Xa) and increased DNAm on the 
Xi. Genes with CpG island promoters are often housekeeping 
genes that are ubiquitously expressed; however, the XCI-related 
DNAm differences are also observed at CpG island promoters 
when a gene is silenced in a tissue (28-31). DNAm is only 
one of a myriad of epigenetic marks associated with transcrip- 
tional repression or activity, and the Encyclopedia of DNA 
Elements (ENCODE) project has created a catalogue of epigen- 
etic marks in a variety of different cell types (32). While many of 
these epigenetic marks, like DNAm, correlate closely with the 
transcriptional status of a single promoter, there are also long- 
range interactions that impact gene expression. Numerous 
studies have provided evidence for various extents of large-scale 
chromosomal domains and structures (33-35). The metaphase 
chromosome banding patterns reflect domains that are on 
average several megabases in size (reviewed in 36) with 
G-positive bands being known to have fewer CpG islands than 
G-negative bands (37) and to be later-replicating (38). Analysis 
of three-dimensional (3D) chromatin structure with Hi-C experi- 
ments has identified specific sections of DNA that are consistent- 
ly in close physical proximity within the nucleus across different 
cell types (39). The formation of facultative heterochromatin 
from one of a pair of essentially identical X chromosomes pro- 
vides a fascinating biological process in which to explore the 
interactions between DNA sequence, chromatin structure and 
the resulting 3D structure within the nucleus. 

We used the Illumina Infinium HumanMethylation450 array, 
which has probes for 99% of RefSeq genes and 96% of CpG 
islands, to predict XCI for six unbalanced t(X;A)s by DNAm. 
We observed variable extents of silencing, ranging from 2 to 
42% of genes silenced, with silencing being more extensive 
for trisomic autosomal regions and regions closest to the trans- 
location breakpoint, although remarkably similar patterns of si- 
lencing were observed when the same chromosome was 
involved in a different translocation, which underscored the de- 
terministic nature of the underlying DNA sequences. 
Genome-scale data from the ENCODE project were used to 
assess both DNA sequence and epigenetic markers for differen- 
tial association between genes predicted to be subject to inacti- 
vation and to escape from inactivation in the context of 
t(X;A)s. We found that autosomal genes predicted to be 
subject to inactivation were depleted of Alus upstream of and 
around the transcription start site (TSS) in addition to within 
the gene body, but that Alus were enriched in the large 
domains of multiple escape genes. Genes predicted to escape 
from inactivation were depleted for L1s within the gene body. 
Impressively, genes subject to inactivation were enriched for 
the heterochromatin marks EZH2, H2A.Z and H3K27me3 
in non-translocated normal autosomes and were 
depleted of RNA transcripts and open chromatin marks. Furthermore, 
analysis of chromatin structure data revealed consistency 
of XCI status within the same topological domains observed on 
non-translocated autosomes. Studying the role DNA features 
play in the spread of inactivation is valuable to expanding our 
understanding not only of XCI but also of other forms of long- 
range gene regulation, which occur across the genome. 

RESULTS 

DNAm spreads into autosomal sequences in 
unbalanced t(X;A)s 

We analysed DNAm using the Illumina Infinium HumanMethylation450 
array on six t(X;A)s that were available as fibroblast cell 
lines or DNA from fibroblast cell lines from the Coriell Institute 
for Medical Research (Camden, NJ, USA) (Supplementary 
Material, Fig. S1). As the karyotypes were unbalanced, some 
of the cell lines were disomic and others trisomic for the auto- 
somal region involved in the translocation, which would be 
expected to influence the amount of selective pressure favouring 
spread of XCI. The translocated portions of four (GM01414, 
GM00074, GM08134 and GM07503) of the six t(X;A)s were 
significantly different (P < 0.01) from the control average 
DNAm levels, demonstrating that the spread of inactivation 
into the translocated portions of these trisomic t(X;A)s resulted 
in significant changes in DNAm. In the two remaining t(X;A)s 
(GM01730 and GM05396), the autosome was present in 
disomy and therefore not anticipated to show as much silencing 
as when in trisomy. Using the Illumina Infinium HumanMethy- 
lation27 array, we had previously shown that probes from both 
HC and IC density CpG islands (the term CpG island will now 
refer to both HCs and ICs) promoters showed hypermethylation 
in females (XaXi) relative to males (Xa), while only ~10% of 
non-island containing promoters showed female-specific hyper- 
methylation (26), and therefore we restricted subsequent ana- 
lyses to CpG island probes. To facilitate the identification of 
candidate sequences containing cis-acting regulatory elements, 
we wished to analyse the data by gene rather than by individual 
probe. 



CpG island DNAm changes with distance from TSS 

To identify which autosomal genes had become inactivated, we 
needed to know which of the probes for a gene should be combined 
into a genic promoter average. The Illumina Infinium 
HumanMethylation450 array contains an average of eight 
probes per gene promoter region, even when probes overlapping 
polymorphisms or putative repetitive elements are eliminated 
(40). Therefore, to determine which probes would consistently 
detect a DNAm difference due to XCI, we used previously 
reported studies of genes that escape XCI to establish two train- 
ing sets of X-linked genes. The first, a 'subject training set', was 
comprised of 173 genes, which were previously found to be 
subject to XCI in all examined tissues (26) and were also 

Figure 1. X-linked CpG island promoter DNAm is influenced by distance from 
the TSS. (A) The DNAm of normal female fibroblasts was used to compare the 
DNAm for two training sets of genes: those X-linked genes known to be subject to 
XCI (dashed line) and those X-linked genes which escape from XCI (solid line). 
Bins of 100 bp were created surrounding the TSS for all X-linked genes in the 
training sets and the DNAm of all probes located within that bin averaged for 
each training set. Grey shading highlights probes which most accurately 
predict XCI. Error bars represent one standard error of the mean. (B) There is a 
significant (Mann-Whitney test, P < 0.0001) difference in the average genic 
DNAm (only using probes between 400 bp upstream of the TSS and 1300 bp 
downstream) between genes in the escape and subject training sets. Each dot 
represents a single gene from the training set, the thick black line is the 
average and the error bars, one standard deviation. The levels of DNAm required 
to call a gene as subject (Group 1 subjects) to XCI (middle graph) or as escaping 
from XCI (right graph) are diagrammed. There were three levels of DNAm examined: 
the control DNAm, the translocated DNAm and the translocated minus control 
DNAm. For Group 2 subject genes, one of those may fall outside of the subject 
range but not within the escape range. 



found to be silenced in at least 78% of Xi hybrids (1). The 
second 'escape training set' was comprised of 32 genes, which 
were previously found to escape from XCI in all examined 
tissues (26) and had been demonstrated to escape from XCI in 
>78% of Xi hybrids (1). We then examined all probes around 
the CpG island promoters of the genes in these training sets 
and plotted the DNAm levels in females. As shown in 
Figure 1A, the subject training set had a consistent DNAm 
level of over 30% surrounding the TSS, whereas the escape 
training set, which was smaller and showed more fluctuation, 
exhibited lower DNAm levels between 400 bp upstream and 
1300 bp downstream of the TSS. Therefore, we used an 
average of the probes located from 400 bp upstream to 
1300 bp downstream of the TSS (shaded grey box in Fig. 1A) 
to create a single DNAm value for each CpG island promoter. 
Such genic averages clearly distinguished the subject and 
escape training sets (Fig. 1B), allowing us to establish criteria 
for calling an autosomal gene as subject to or escaping from 
inactivation. Any CpG island probe that showed >20% DNAm on 
the non-translocated chromosome was excluded from further 
analysis, and then a genic inactivation status was predicted. 
Briefly, a gene was called subject to inactivation when the 
genic CpG island promoter average showed DNAm >25% 
when translocated, and a DNAm delta (translocated minus control) 
between 22 and 60%. Genes were classified as escaping from 
inactivation when DNAm was <25% when translocated and the 
DNAm delta (translocated minus control) was between −10 and 
20%. Additionally, an inactivation status was only predicted 
when at least two CpG island probes were between 400 bp 
upstream and 1300 bp downstream of the TSS. These are stringent 
criteria and might miss some genes, in particular those on a 
trisomic t(X;A) where only one of the three copies of an autosomal 
gene subject to inactivation is expected to gain DNAm. Therefore, 
we created a second, less-restrictive category to identify 
another group of subject genes. In Group 2 subject genes, 
DNAm was required to meet only two of the three criteria 
(control DNAm, translocated DNAm and DNAm delta) previously 
used to define Group 1 subject genes so long as DNAm was not 
within the escape range. Genes predicted by DNAm to be subject 
to XCI, both Groups 1 and 2, will henceforth be termed 'subjects' 
while genes predicted by DNAm to escape from XCI will be 
referred to as 'escapes'. 
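
The calling rules described above can be summarised in a short sketch. The Python below is illustrative only and is not the code used in the study: the function names and the probe tuple layout are hypothetical, and the 20% ceiling used for the 'control DNAm' criterion is an assumption borrowed from the probe-exclusion threshold, because the text does not state a separate control cut-off.

```python
from statistics import mean

# Window around the TSS used for the genic average (bp; negative = upstream).
WINDOW_START, WINDOW_END = -400, 1300
# Assumption: the control criterion reuses the 20% probe-exclusion threshold.
CONTROL_MAX = 20.0


def genic_average(probes):
    """Average control and translocated DNAm (%) over usable probes.

    probes: iterable of (offset_from_tss_bp, control_pct, translocated_pct).
    Returns (control_avg, translocated_avg), or None when fewer than two
    probes remain after filtering, in which case no status is predicted.
    """
    kept = [
        (control, translocated)
        for offset, control, translocated in probes
        if WINDOW_START <= offset <= WINDOW_END and control <= CONTROL_MAX
    ]
    if len(kept) < 2:
        return None
    return mean(c for c, _ in kept), mean(t for _, t in kept)


def call_status(control, translocated):
    """Classify one gene from its genic DNAm averages (both in %)."""
    delta = translocated - control
    # Escape: low DNAm when translocated and little change from control.
    if translocated < 25 and -10 <= delta <= 20:
        return "escape"
    criteria = (
        control <= CONTROL_MAX,   # control DNAm (assumed threshold)
        translocated > 25,        # translocated DNAm
        22 <= delta <= 60,        # DNAm delta
    )
    met = sum(criteria)
    if met == 3:
        return "subject_group1"
    if met == 2:
        return "subject_group2"
    return "uncallable"
```

For example, a gene averaging 5% DNAm in controls and 45% when translocated would be called a Group 1 subject, whereas one averaging 5% and 8% would be called an escape.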

DNAm analysis predicts varied degrees of spread 
of inactivation between t(X;A)s 

We generated a DNAm heat map for each t(X;A), which compared 
the average genic DNAm levels per CpG island gene promoter 
in controls (all fibroblast lines in which the chromosome 
was not translocated) to the average genic DNAm levels on the 
translocated chromosome which was then used to assign an 

Figure 2. Autosomal CpG island promoter DNAm suggests different degrees of spread of inactivation. For each autosome involved in a t(X;A), the normal genic 
average DNAm is shown to the left and the average genic DNAm of the t(X;A) in the middle. The inactivation status for each gene is shown to the right with the number 
of genes subject to inactivation (Group 1: red, Group 2: orange), escaping from inactivation (green) and uncallable (grey) given below each sample. Average genic 
DNAm levels are shown in the order found on the chromosome but the distance between genes is not to scale. On the far right of each section is a scale with the 
distance from the breakpoint. DNAm is shown on a colour scale from 0% (yellow) to 50% (green) to 100% (blue). 

inactivation status (Fig. 2). For each gene within the autosomal 
portion of the t(X;A), an average DNAm was calculated for all 
samples with a normal autosome and for the sample carrying 
the t(X;A). The false discovery rate for each autosome was 
calculated by dividing the false positives by the total number of 
genes (false positives plus true positives). False positives were 
autosomal genes called as subject despite not being on the autosome 
involved in the t(X;A). True positives were autosomal genes called 
as escape that were not on the autosome involved in the t(X;A). The 
average false discovery rate was 0.003 with a maximum false discovery 
rate of 0.014, which was on chromosome 21 (chr21) in 
GM01414. The autosomal portion of each t(X;A) was broken 
into quartiles and the percentage of genes subject to XCI 
determined (Supplementary Material, Fig. S2). A test of each 
t(X;A) revealed that the distribution of genes subject to inactivation 
was significantly different (P < 0.05) than expected by 
chance with an over-representation of subject genes in the quartile 
closest to the breakpoint. The four samples (GM07503, 
GM01414, GM00074 and GM08134) in which the autosomal 
portion of the t(X;A) was trisomic showed a 
higher percentage of subjects than the samples in which the 
autosomal portion of the t(X;A) was disomic (GM01730 and 
GM05396); however, there was considerable variability in the 
extent of predicted silencing between the different t(X;A)s. 
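
The false discovery rate and the quartile summary described above reduce to simple counting. The following Python is a minimal sketch under stated assumptions rather than the analysis code: the function names are hypothetical, and quartiles are assumed here to be defined by physical distance from the breakpoint (the text does not say whether they were defined by distance or by gene count).

```python
def false_discovery_rate(false_positives, true_positives):
    """FDR = FP / (FP + TP), using calls on autosomes not in the t(X;A)."""
    total = false_positives + true_positives
    if total == 0:
        raise ValueError("no called genes on non-translocated autosomes")
    return false_positives / total


def subject_fraction_by_quartile(genes, region_length):
    """Fraction of called genes that are subjects in each distance quartile.

    genes: iterable of (distance_from_breakpoint_bp, status), with status
    either 'subject' or 'escape'.
    region_length: length in bp of the translocated autosomal portion.
    Returns four fractions ordered from the breakpoint outwards; a quartile
    without called genes yields None.
    """
    counts = [[0, 0] for _ in range(4)]  # [subjects, all called genes]
    for distance, status in genes:
        # Clamp so a gene at the far end falls in the last quartile.
        quartile = min(int(4 * distance / region_length), 3)
        counts[quartile][1] += 1
        if status == "subject":
            counts[quartile][0] += 1
    return [subjects / total if total else None for subjects, total in counts]
```

Whether the resulting distribution departs from chance would then be assessed with a separate statistical test, which this sketch does not attempt to reproduce.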

GM05396 (t(X;22), Fig. 2F) showed the lowest number and 
percentage of subjects (2%), consistent with selection against 
inactivating the disomic chromosome 22, and the reported late 
replication of only the X chromosomal portion of the trans- 
location, although dysmorphic features were reported for the 
female (Coriell Institute for Medical Research). GM07503 
(t(X;2), Fig. 2A) also showed minimal silencing (5%), which 
was predominantly close to the translocation breakpoint. This 
limited silencing could be consistent with the mental retar- 
dation and dysmorphic features seen in the proband. Over 
24% silencing was observed for the other four translocations 
(Fig. 2B-E). GM01414 and GM00074 (t(X;9) and t(X;14), 
respectively) have been previously extensively examined 
(22,41,42). The individuals from which the samples were col- 
lected lack substantial clinical features of autosomal trisomy 
and the t(X;A)s are predominantly late replicating or hypoacetylated, 
and show only partial spread of the XIST RNA along the 
chromosome (22). 

The remaining two cell lines (GM08134 and GM01730) both 
had t(X;21)s, involving the majority of chr21; however, for the 
former, chr21 was essentially trisomic, while in the latter, 
chr21 was disomic. Surprisingly, there was quite a similar 
pattern of silencing along chr21, consistent with GM01730 
being reported to have late replication of both the X and 
autosomal portion of the translocation in the majority of cells and 
phenotypic similarities to 21 deletion syndrome (43). Seventeen 
percent of genes classified as subjects in one t(X;21) were 
classified as escapes in the other t(X;21). The average difference in 
DNAm between the discordant chr21 genes was 17% compared 
with 1% difference in DNAm in the 83% of concordant genes, 
supporting that the discordant genes truly had a different XCI 
status between the two different chr21 t(X;A)s. The genic DNAm 
averages for CpG island promoters on the chr21 portions of the 
t(X;21)s with escapes and subjects, based on the criteria outlined 
in Figure 1B, are shown in Figure 3. In addition to an overall 
similarity in the patterns of DNAm, the conservation of the 
sharp boundary between escapes and subjects (dashed line) 
suggested conservation of a putative boundary-defining sequence 
within a ~300 kb region. By expanding our analysis beyond 
only CpG islands associated with TSSs to non-promoter CpG 
islands such as the unmethylated, and therefore escaping, CpG 
island located between C21orf82 and KCNE2 (Fig. 3), we were 
able to refine the region in which the putative boundary-defining 
sequence might be located to ~175 kb. 

DNA sequence features differ around subject 
and escape genes 

The similarities between the inactivation statuses in the two 
t(X;21)s suggested that DNA sequences and/or features of chromatin 
structure might play a substantial role in the determination 
of inactivation status. We investigated the potential features 
associated with the spread of heterochromatin on the autosomal 
portion of the t(X;A)s by calculating a significance score and 
visualizing DNA features as anchored coverage plots to 
compare subjects and escapes. For each feature, three genomic 
sections were examined: the promoter section (±5 kb around 
the TSS); the genic section (entire length of transcription) and 
the upstream section (the 15 kb region upstream of the TSS). 
As described in the 'Materials and methods' section, the enrichment 
significance score reflects the enrichment of the indicated 
feature within either subjects or escapes. Supplementary Material, 
Table S1 includes the mean coverage of each feature and the 
percentage of the defined regions that contained each feature 
while Supplementary Material, Figure S3 includes the anchored 
density plots for all features. Table 1 lists the top 10 DNA 
sequence features that differed between the promoters of subjects 
and escapes. Of the 67 DNA sequence features examined, Alus 
were most significantly depleted (1.44-fold) in the promoters 
of subjects compared with escapes and were also significantly 
depleted in the upstream section (1.44-fold) and gene body 
(1.36-fold) of subjects. L1s were significantly enriched in the 
genic section of subjects compared with escapes (Fig. 4A). 
While the DNA sequence features are static, chromatin properties 
can vary between cells and we next examined high-throughput 
chromatin data to compare between subjects and 
escapes. 
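
The three genomic sections and the per-gene feature coverage can be made concrete with a short sketch. The Python below is illustrative only: the function names are hypothetical, coordinates are assumed to be half-open intervals on a single chromosome, and the sketch computes only the fraction of a section covered by a feature, not the enrichment significance score defined in the 'Materials and methods' section.

```python
PROMOTER_FLANK = 5_000    # bp either side of the TSS
UPSTREAM_LENGTH = 15_000  # bp upstream of the TSS


def sections(tss, tes, strand):
    """Return (start, end) for the promoter, genic and upstream sections.

    tss, tes: transcription start and end positions; strand is '+' or '-'.
    Upstream means lower coordinates on '+' and higher coordinates on '-'.
    """
    if strand == "+":
        genic = (tss, tes)
        upstream = (tss - UPSTREAM_LENGTH, tss)
    else:
        genic = (tes, tss)
        upstream = (tss, tss + UPSTREAM_LENGTH)
    promoter = (tss - PROMOTER_FLANK, tss + PROMOTER_FLANK)
    return {"promoter": promoter, "genic": genic, "upstream": upstream}


def coverage(section, features):
    """Fraction of a section's bases overlapped by at least one feature.

    features: iterable of (start, end) intervals, e.g. Alu annotations;
    overlapping features are counted once.
    """
    start, end = section
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in features
        if s < end and e > start
    )
    covered = 0
    current_end = start
    for s, e in clipped:
        s = max(s, current_end)  # skip bases already counted
        if e > s:
            covered += e - s
            current_end = e
    return covered / (end - start)
```

Averaging such coverage values over all subjects and over all escapes would give per-group means of the kind reported as mean enrichment in Table 1.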

Heterochromatic marks are enriched at subjects 
in non-translocated cells 

Given the detected differences in DNA sequence features, we 
were interested in whether differences in chromatin features could also 
be detected between subjects and escapes. One would anticipate 
changes in chromatin marks in the t(X;A) cells themselves; 
however, separating cause and effect would not be possible in 
our cell lines. Therefore, the question we examined was 
whether there were predisposing chromatin marks identifiable 
through the ENCODE data, which captures these properties in 
a variety of normal fibroblast cell lines, not the t(X;A) cells. 
As with the DNA sequence features, significance scores and 
anchored coverage plots were generated for chromatin features 
from the ENCODE project datasets (44,45). In the promoter 
and upstream genomic sections used above, EZH2 was the 
feature with the most significant difference between subjects 
and escapes while in the genic section, nucleus longPolyA, 

Figure 3. t(X;21)s show conservation of subject and escape domains. (A) The genic methylation for all CpG island promoters on chr21 is shown as a line graph with 
each dot representing a single CpG island promoter. The dot signifies the predicted XCI status (subject: light grey squares, escape: dark grey circles, uncallable: white 
triangles) based on the level of methylation on the translocated chromosome, the average control methylation and the delta between the two (criteria explained in 
Fig. 1B). Above each line graph, the range and size of the subject (light grey) and escape (dark grey) domains is denoted by thick bars. The vertical hashed line 
marks the potential conserved boundary element located between SLC5A3/MRPS6 and FAM165B. (B) DNAm in the chr21 boundary region marked in (A). 
SLC5A3/MRPS6 is predicted to escape from inactivation while FAM165B is predicted to be subject to inactivation. The gene and CpG density associated with 
each probe is shown below. 



was the most significantly different (Table 2). EZH2, 
H3K27me3, H2A.Z and nuclear longPolyA, reflecting nuclear 
transcription levels, were found to be significantly different 
between subjects and escapes across all three examined 
genomic sections. As might be anticipated, the heterochromatic 
marks EZH2, H3K27me3 and H3K9me3 were enriched in sub- 
jects compared with escapes whereas nuclear longPolyA was 
depleted in subjects compared with escapes (Fig. 4B). While 
escapes showed expression levels similar to the genome 
average, subjects were expressed at a significantly lower level. 
Interestingly, the difference between transcription levels 
around subjects and escapes was strongest when transcripts 
from the nucleus compartment were used in comparison with 
cytosolic or whole cell fractions, potentially emphasizing a 
negative influence of nuclear transcripts both up- and 
downstream on the spread of heterochromatin. Some chromatin 
features, such as H3K9me3, were only significantly different at 
the genic section (Supplementary Material, Table S1). We did 
not, however, find significant differences in the 
transcription-associated markers RNA polymerase II or H3K4me3 
(Supplementary Material, Fig. S3) within any genomic section 
between subjects and escapes. 

Thus, we observed both DNA sequence and epigenomic dif- 
ferences correlated with the extent of silencing on the autosomal 
portion of t(X;A)s. The significant association of some chroma- 
tin features in normal somatic cells with the ability of a translo- 
cated gene to be silenced might reflect the known interplay 
between DNA sequence and chromatin features, which charac- 
terizes heterochromatic G-positive regions. Our analysis pre- 
dicted inactivation based on CpG island DNAm and over 66% 

Table 1. DNA sequence features with the 10 most significant q-values in the promoter genomic section 

                 Promoter genomic section        Genic genomic section           Upstream genomic section
DNA sequence     q-value       Mean              q-value       Mean              q-value       Mean
feature                        enrichment                      enrichment                      enrichment

SINE             4.26E-12      S: 0.1495         2.20E-05      S: [illegible]    4.26E-12      S: [illegible]
                               E: 0.2129                       E: 0.1924                       E: 0.2516
Alu              1.23E-11      S: 0.1164         0.0002736     S: [illegible]    1.23E-11      S: [illegible]
                               E: 0.1758                       E: 0.1571                       E: 0.2196
IC               8.65E-06      S: 0.2820         4.26E-12      S: [illegible]    3.20E-13      S: [illegible]
                               E: 0.3146                       E: 0.2775                       E: 0.2302
HC               0.0001908     S: 0.1661         0.207         S: 0.1091         0.3131        S: 0.0653
                               E: 0.1393                       E: 0.1135                       E: 0.0647
RepeatAll        [illegible]   S: [illegible]    [illegible]   S: 0.3789         [illegible]   S: 0.4973
                               E: 0.3883                       E: 0.3683                       E: 0.5033
Satellite        0.0009831     S: 7.00E-04       0.09876       S: 0.0000         0.1729        S: 1.00E-04
                               E: 0.0000                       E: 0.0000                       E: 1.00E-04
CpGs             0.001308      S: 0.1209         0.2887        S: 0.0889         0.2212        S: 0.0459
                               E: 0.1018                       E: 0.0907                       E: 0.0476
LTR              0.008435      S: 0.0575         2.55E-08      S: 0.0483         7.17E-05      S: 0.1038
                               E: 0.0431                       E: 0.0301                       E: 0.0758
hAT-Blackjack    0.01463       S: 1.00E-04       0.0001908     S: 6.00E-04       0.1995        S: 5.00E-04
                               E: 7.00E-04                     E: 6.00E-04                     E: 7.00E-04
L2               0.0361        S: 0.0337         0.004727      S: 0.0337         0.09344       S: 0.0402
                               E: 0.0385                       E: 0.0311                       E: 0.0373

Italicized q-values are statistically significant (q < 0.01) and mean enrichment is given for subject (S) and escape (E) genes and the enriched category highlighted in 
boldface type. All examined features are listed in Supplementary Material, Table S1. 

Figure 4. Average density plots of DNA sequence and epigenetic features around subjects and escapes after G-banding separation. Anchored density plots around the 
TSSs for DNA sequence features (A) and chromatin features (B and C). The x-axis corresponds to the distance in base pairs upstream (negative) or downstream (positive) 
of the TSS, whereas the y-axis shows the average density of each feature at the location relative to TSSs within the group (subjects: red line and escapes: green 
line). The dashed grey line in the plots represents 5000 randomly selected autosome genes with intermediate or high CpG density regions at the TSSs. All lines are 
plotted using the locally weighted scatterplot smoothing approach. The cell type in which chromatin features were examined is given in brackets after that feature title. 
Subjects and escapes were compared for three genomic sections: the promoter (P) section (±5 kb around the TSS); the genic (G) section (entire length of transcription) 
and the upstream (U) section (the 15 kb region upstream of the TSS). Statistical significance comparing subjects and escapes is shown for each feature in which a test 
was performed. (C) Density plots further segregated into G-band negative (orange/light green) and G-band positive (dark red/dark green). 



of genes analysed were located within G-negative regions; 
however, only 46% of subject genes were in G-negative bands, 
so we further subdivided our examination of chromatin features 
by G-band. Some features showed little difference between 
G-positive and G-negative bands (e.g. EZH2 and nuclear long- 
PolyA) while in others (e.g. H3K9me3), separation based on 
G-banding could explain much of the observed differences 
in subjects and escapes (Fig. 4C; Supplementary Material, 



Fig. S4). For H2A.Z, G-banding accounted for much, but not all,
of the differential feature densities. Thus, there is enrichment
of subjects in G-positive and H3K9me3-enriched chromatin,
but an additional predisposition to silencing for both G-positive
and G-negative regions that are enriched in the ability to bind
EZH2 and recruit H3K27me3. The conclusion that the presence
of heterochromatin is directly related to the physical compartments
into which DNA is located within the nucleus led us to
Table 2. Chromatin features with the 10 most significant q-values in the promoter genomic section

Chromatin feature (cell type)  Promoter genomic section          Genic genomic section             Upstream genomic section
                               q-value     S        E            q-value     S        E            q-value     S        E
EZH2_(39875) (NHDF-Ad)         4.07E-08    0.5261   0.3433       7.64E-09    0.3719   0.261        4.88E-07    0.4403   [illegible]
nucleus_longPolyA (IMR90)      4.55E-06    0.3743   [illegible]  1.54E-06    0.4796   [illegible]  2.72E-07    0.2125   [illegible]
H2A.Z (NHDF-Ad)                8.65E-06    0.5946   [illegible]  0.002435    0.4477   [illegible]  0.0001149   0.4728   0.3787
Cell_longPolyA (IMR90)         0.0001188   0.2219   0.2929       5.78E-05    0.3316   0.439        2.54E-05    0.0819   0.1506
[illegible] (NHDF-Ad)          [illegible] 0.4523   0.3312       [illegible] 0.3077   0.2372       [illegible] 0.4901   0.3575
Cytosol_longPolyA (IMR90)      0.03067     0.1192   0.1445       0.001652    0.1415   0.1918       0.0001698   0.0585   0.0934
CTCF_(SC-15914) (IMR90)        0.03814     0.0385   0.0321       0.06623     0.0226   0.0261       0.1975      0.0244   0.0211
H3K9ac (NHDF-Ad)               0.04083     0.5433   0.4819       0.07869     0.4682   0.4145       0.02219     0.4075   0.3061
COREST_(sc-30189) (IMR90)      0.05147     0.0215   0.0212       0.2408      0.0164   0.0172       0.1729      0.0131   0.0154
Cell_total (IMR90)             0.0647      0.3924   0.4256       0.09316     0.517    0.5611       0.0008847   0.1979   0.2642

Italicized q-values are statistically significant (q < 0.01) and mean enrichment is given for subject (S) and escape (E) genes and the enriched category highlighted in
boldface type. All examined features are listed in Supplementary Material, Table S1.



observe high consistency of either subject or escape groups
within each domain, reflected in low entropy. We investigated
195 topological domains defined in IMR90 cells that contained
more than one subject or escape gene. A stringent entropy
measure of 0 was observed for 67% (n = 130) of domains,
indicating a strong tendency for subject and escape genes to segregate (P =
1.9031 × 10~'^). Similar segregation was observed for domains
in human ES cells (P = 2.6264 × 10~'^). These findings
highlight the role that higher-order structure may play in determining the
spread of inactivation along the chromosome.



DISCUSSION 

The considerable extent of DNAm of CpG island promoters 
observed on the t(X;A)s suggests that DNAm profiling can be 
used as a means to detect the spread of silencing from the X 
chromosome to the autosome; thereby providing a means to iden- 
tify candidate DNA elements involved in the spread of inactiva- 
tion or escape from inactivation. The CpG island promoters of 
autosomal genes typically show extremely low DNAm (46); 
therefore, an increase in DNAm at the CpG island promoter of 
an autosomal gene in a t(X;A) suggests that the gene has 
become silenced due to the spread of inactivation. Previous ana- 
lysis of t(X;A)s has demonstrated that the DNAm status of auto- 
somal genes shows good agreement with inactivation status 
(21,23). By using the Illumina Infinium HumanMethylation450
platform, we were able to use DNAm to predict the spread of in- 
activation across the autosomal portion of six t(X;A)s. Overall 
our DNAm-based assignment of inactivation status resulted in a 
lower frequency of XCI than seen with previous assessments of 
individual genes. There could be several reasons for this discrep- 
ancy. Importantly, while the gain of DNAm gives us confidence 
that a gene has been influenced by the spread of silencing, a 
gain of DNAm may not always be present or retained when 



investigate the features of larger domains of subjects and 
escapes, which might be physically separated in the nucleus (39). 

To investigate the impact of DNA sequence within regions of 
DNA larger than just that surrounding the TSS, we grouped sub- 
jects and escapes together to form domains that were defined as 
contiguous regions of subjects or escapes, as shown above the 
plots for chr21 in Figure 3. Domains required more than one
genic call, and were not disrupted by uncallable genic regions,
but were ended by a single discordant genic call. There was a
large range of both escape and subject domain sizes, but there
was no statistical difference in average size (escape: 2.6 Mb,
subject: 1.0 Mb, P = 0.0510). Five types of DNA sequence
features were examined: L1, Alu, long terminal repeats (LTRs), low
complexity and simple repeats. Although none of these features
showed a significant difference between subject and escape
domains, the largest difference was again observed at Alu
elements, which were enriched in domains that contained multiple
escapes (14.74%) compared with domains that contained
multiple subjects (11.26%) (Supplementary Material, Fig. S5).
Differences in the repetitive element content of large-scale domains
suggest that long-range regulatory processes between these
domains may be involved in the spread of inactivation.
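
The domain definition above can be illustrated with a minimal sketch. It assumes that gene calls are supplied in chromosomal order as 'S' (subject), 'E' (escape) or 'U' (uncallable); the function name and the tuple layout of the output are hypothetical and are not taken from the scripts used in this study.

```python
def call_domains(calls):
    """Group ordered gene calls into subject/escape domains.

    calls: list of 'S', 'E' or 'U' (uncallable), in chromosomal order.
    Returns a list of (status, first_index, last_index, n_calls).
    """
    domains = []
    status, start, end, n = None, None, None, 0
    for i, call in enumerate(calls):
        if call == "U":
            continue  # uncallable regions do not disrupt a domain
        if call == status:
            end = i
            n += 1
        else:
            # a single discordant call ends the current domain
            if status is not None and n > 1:
                domains.append((status, start, end, n))
            status, start, end, n = call, i, i, 1
    if status is not None and n > 1:  # domains need more than one call
        domains.append((status, start, end, n))
    return domains


calls = ["S", "S", "U", "S", "E", "S", "E", "E", "U", "E"]
print(call_domains(calls))
```

In this toy input the isolated calls at positions 4 and 5 form no domain, because each run contains only one call.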

Genes segregate into topological domains based on 
inactivation status 

To study whether the spread of heterochromatin on t(X;A) was 
influenced by higher order chromatin organization, we assessed 
the consistency of X-inactivation status within topological 
domains. These domains are megabase-sized local chromatin
interaction zones obtained from Hi-C analysis of the IMR90
and human embryonic stem (ES) cell lines (39). By computing
the entropy of subject/escape groups within each domain, we 
hypothesized that if there existed an influence of the topological 
domains, from a non-translocated context, on XCI, we would 
genes are silenced. Secondly, there might be a bias in genes that
were chosen for assessment in previous studies, although they 
were not particularly enriched at the breakpoints where we did 
observe greater XCI (reviewed in 2). It is also possible that 
genes lacking CpG island promoters, which we did not address 
with DNAm, may be more prone to silencing. However, at least 
some of the difference is likely attributable to the stringency at 
which we set our thresholds for DNAm. Heterogeneity and 
partial expression have previously been reported for silencing in 
t(X;A) and this would reduce the level of DNAm (47). If some 
autosomal genes are subject to inactivation in only a subset of 
cells, then the average DNAm might not be high enough to be pre- 
dicted as subject to inactivation. Genes with average DNAm in the 
uncallable range could be variably inactivated genes that are only 
inactivated in a subset of cells. Samples where the autosomal 
portion of the t(X;A) was trisomic would also be expected to 
have lower DNAm than when the autosomal portion was 
disomic, since only one of the three autosomes would gain
DNAm. We have included a less stringent group of subject 
genes, to allow for genes with slightly lower DNAm to be classi- 
fied as subject; however, it is possible that some escape genes may 
be subjected to inactivation in a small proportion of cells. 
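
The dilution of DNAm by copy number and by mosaic silencing can be made concrete with a small worked example. This is an idealized sketch only: it assumes that a silenced copy is fully methylated (1.0), that all other copies are unmethylated (0.0) and that a given fraction of cells carries the silenced copy. These values and the function name are illustrative assumptions, not measurements from this study.

```python
def expected_dnam(copies, silenced_fraction=1.0,
                  methylated=1.0, unmethylated=0.0):
    """Average promoter DNAm over `copies` alleles when one copy is
    silenced in `silenced_fraction` of the cells (idealized values)."""
    in_silenced_cells = (methylated + (copies - 1) * unmethylated) / copies
    in_other_cells = unmethylated
    return (silenced_fraction * in_silenced_cells
            + (1 - silenced_fraction) * in_other_cells)


print(expected_dnam(2))        # disomic, silenced in all cells
print(expected_dnam(3))        # trisomic, silenced in all cells
print(expected_dnam(3, 0.5))   # trisomic, silenced in half of the cells
```

Under these assumptions a trisomic region reaches at most one-third average DNAm compared with one-half for a disomic region, and silencing in only a subset of cells lowers the average further, which is consistent with such genes falling into the uncallable range.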

There were two main classes of features, DNA sequence and 
chromatin, that we wished to compare with inactivation status. 
Studies examining the DNA sequence of regions on the X chromo- 
some which are subject to XCI compared with regions which 
escape from XCI are confounded by the evolutionary pressures 
which the X chromosome has undergone. Examination of the 
DNA sequence for domains subject to inactivation compared 
with domains that escape from inactivation on the autosomal 
portion of t(X;A)s disentangles the complex evolutionary
history of the X chromosome and thus provides a complementary 
system in which to study the role that sequence composition plays 
in determining XCI status. Chromatin features, such as those asso- 
ciated with silent heterochromatin, may further influence the for- 
mation of larger nuclear compartments and/or interactions 
between different regions of DNA (39). We observed differential 
enrichment of repetitive elements between subjects and escapes at
both the domain and individual gene level. Differences in DNA 
sequence are thought to allow domains which escape from XCI 
to loop out of the Xi domain (48) while domains rich in repetitive 
elements come together to form the dense heterochromatic core of
the Xi which can be visualized as a 'CoT1 hole' (49). Thus regions
with high Long Interspersed Element (LINE) frequency may 
form the Barr body while the regions with low LINE frequency 
would be capable of looping outside of the silent Xi domain 
(48). The similarities in the patterns of inactivation between the 
two t(X;A) involving chr21 (GM01730 and GM08134) strongly 
support the role of DNA sequence in determining inactivation 
status. In particular, a transition between escape and subject 
domains was observed to lie in the same ~175 kb region and 
this region contained ENCODE defined insulator elements that 
are known to play a role in genome organization (reviewed in 
35). L1 enrichment was seen at subjects but the striking depletion
of Alus at subjects was even more dramatic and Alus were also 
seen to be over-represented in escapes at the domain levels. The 
relationship between DNA sequence and chromatin is complex 
and determining the exact interplay between the two will 
require further studies. We observed some chromatin marks 
(notably H3K9me3) that reflected the propensity of G-positive 



regions to undergo silencing; however, other marks, such as
EZH2, were not reflective of G-banding, consistent with other
studies that have shown a physical separation of H3K9me3- and
H3K27me3-dense regions on the X chromosome (50); thus, the
enrichments observed may be aggregates of significant chromatin
neighbourhoods.

Polycomb protein recruitment and histone modifications such as 
the accumulation of H3K27me3 and the decrease in H3K4me3 
have been associated with XCI in two large-scale allelic analyses 
of ChIP-chip or ChIP-seq experiments in mouse (51,52). These
studies suggest that XCI spreads initially to a small number of 
Polycomb stations and from there to more frequent but weaker 
locations along the chromosome to complete the spread of XCI. 
In this study, we observed that genes subject to inactivation were 
enriched for EZH2, a Polycomb group protein, and H3K27me3 
in cells not containing a t(X;A). The enrichment of EZH2 and 
H3K27me3 supports a model in which Polycomb recruitment 
sites predispose DNA to allow the spread of heterochromatin. 
Our findings of overrepresentation of RNA transcripts and 
DNase I hypersensitive sites at escape TSSs suggest that genes ori- 
ginally highly transcribed or in open chromatin domains are more 
inclined to escape from heterochromatin spreading. The observa- 
tions suggest a potential relationship between autosomal properties 
profiled by ENCODE in normal cells and the observed subject/ 
escape patterning in the translocated chromosomes. 

Exactly how chromatin features influence the 3D structure of 
the chromosome within the nucleus is unknown; however, tran- 
scriptional silencing likely occurs through various pathways. 
The Xi has previously been shown to have features of heterochro- 
matin that appear to form topically distinct domains (50,53). 
Topological domains, as defined by Hi-C, have been found to 
be stable across cell types and during development in human 
and mouse cells, and a model has been proposed that regions 
within domains can be dynamic to take part in cell-type-specific 
regulatory events (39,54). Interestingly, we found that the poten- 
tial for heterochromatin spread tended to be consistent within 
topological domains of non-translocated autosomes, observed 
as statistically significant segregation of inactivation status. This 
indicates a further influence of topological domains on XCI in
addition to the sequence and epigenetic properties. 

The spread of XCI in t(X;A)s provides a means to identify elements
involved in the spread of heterochromatin by removing
the complex evolutionary history of the X chromosome, and 
we report the use of DNAm to assess XCI in six t(X;A). 
However, there are multiple additional considerations regarding 
why a particular region of genes might be subject to or escape 
from XCI. A major confounding factor which will influence 
the degree to which inactivation is observed to have spread on 
the autosomal portion of a t(X;A) is secondary selection. In 
order to maintain the most normal expression pattern, when 
the autosomal portion of the t(X;A) is disomic there may be se- 
lection against cells in which extensive silencing occurs. Con- 
versely, when the autosomal portion of the X;autosome 
translocation is trisomic, selection may work against cells in 
which minimal silencing occurs. Indeed, since most autosomal 
trisomies are not viable, the ability of t(X;A)s to exist with triso- 
mic autosomal portions speaks to the ability of inactivation to 
achieve a more normal expression pattern and minimize the 
negative phenotype (24), and we observed greater silencing in 
t(X;A) which were trisomic. However, spread of inactivation 
may have been more extensive at the time of silencing and
reactivation may have occurred subsequently; indeed, selection
continues as cell lines are cultured. We restricted our analysis
to fibroblast lines after preliminary analysis of a lymphoblast
line (GM10006) showed poor correlation of DNAm patterns
on the X chromosomal portion with the Xi in female fibroblasts,
beyond that attributable to normal tissue-specific differences in
DNAm and escape from XCI (e.g. 26,55). In a previous analysis
of GM01414 and GM00074, the XIST RNA was localized
to only ~50% of the autosomal chromatin despite hypoacetylation
or late replication of a larger portion of the autosome,
leading to the suggestion that at an earlier time there may have 
been more extensive spread of XIST RNA (22); and there may 
also have been more extensive DNAm prior to culture of these 
cells. 

Another consideration is the distance over which silencing 
would need to spread, both from the X inactivation centre on 
the X chromosome, and across autosomal material. Previous 
analysis of a proband with a t(X;A) with a high frequency of 
LINEs at the breakpoint and a minimal phenotype led to the suggestion
that the repetitive element content surrounding the
breakpoint influenced the degree to which inactivation spreads 
(24). We did not observe a clear relationship between the 
spread of inactivation and the repetitive element content at the 
breakpoints we examined; however, GM05396, which showed 
the lowest degree of genes subject to inactivation, did have the 
lowest L1 and the highest Alu content of all the t(X;A)s
studied. We observed significantly more silencing close to the 
X chromosome (Supplementary Material, Fig. S2), raising the 
question of whether the silencing observed is a compendium 
of both XCI and position-effect variegation. GM07503 showed 
limited silencing which extended nearly 55 Mb from the break- 
point. However, the silencing was enriched close to the break- 
point which may reflect position-effect variegation. Overall, 
no general correlation between the distance of the breakpoint 
from the XIST locus and the ability to silence was observed, 
and clearly inactivation is able to spread a considerable distance 
into an autosome; with silencing in GM01414 spreading over 
140 Mb from the XIST locus to the farthest autosomal gene predicted
to be subject to XCI. Additionally, the ability of XCI to
spread across a centromere may be more limited than the 
ability to spread through euchromatic sequences. In the mouse, 
Xist RNA does not bind to the centromere (56); and the majority 
of human genes which escape from XCI are located on the other 
side of the centromere from the XIST locus (57) leading to the 
suggestion that the centromere may act as barrier to the spread 
of inactivation on the X chromosome. Therefore, it is interesting 
to note that GM05396, which showed the lowest degree of inactivation,
is a dicentric chromosome.

Through the study of DNAm in t(X;A) we identified 332 auto- 
somal genes that were subject to the spread of XCI, and another 
1219 that potentially escaped silencing. In addition to the impact 
of selection and distance from the breakpoint, we observed a sub- 
stantial effect of DNA sequence, in particular repetitive ele- 
ments, on the ability of genes to be impacted by XCI. As 
previously reported, L1 elements were enriched around genes
subject to inactivation, but intriguingly we also observed that 
other elements, notably Alus, were found to be enriched at 
genes that escape from inactivation. Enrichments of DNA 
sequences were observed across larger domains of genes, and 



assessment of topologically associated domains showed highly 
significant clustering of escapes and subjects within such 
domains. In support of the spread of XCI through pre-existing 
chromatin structures, we observed the enrichment of heterochromatic
marks and proteins in normal cells for genes that were
subject to silencing in the translocations, suggesting that these 
regions may be predisposed to allow the spread of inactivation. 
The substantial enrichment of H3K27me3 and Polycomb 
group proteins was independent of underlying G-banding rela- 
tionships, supporting the recent suggestion that spread of XCI 
may act through Polycomb stations. Overall, identifying the 
DNA features enabling spread or escape from inactivation will 
be an important contribution to understanding long-range gene 
regulation. 

MATERIALS AND METHODS 

Sample preparation and bisulfite conversion 

DNA was extracted using a standard extraction protocol with a 
Qiagen RNA/DNA Allprep kit. 750 ng of DNA was bisulfite 
converted using the EZ DNA methylation kit (Zymo Research) 
with the alternative incubation conditions (replace steps 4 and 5 
of standard incubation with [95°C (30 s), 50°C (60 min)] for 16 
cycles) outlined for use with the Illumina Infinium Human- 
Methylation450 array. DNA was obtained from GM07503; 
GM01414; GM00074; GM01730; GM08134; GM05396 and 
GM08399 (46,XX) (58). Karyograms for each t(X;A) were 
created using the Karyogram drawing tool at
http://www.cydas.org/index.html. Only the autosome involved in the t(X;A) as
well as the X and Y chromosomes were shown. Karyotypes
and clinical data as reported by Coriell Institute for Medical
Research are as follows:

GM07503: 46,X,der(X)t(X;2)(q27;p16)mat.arr Xq27.3q28(143 412 907-154 887 040)x1,2p25.3p16.1(2771-55 375 877)x3. Mental retardation and dysmorphic features.
GM01414: 46,X,-X,+der(9)t(X;9)(q11.1;q32)mat.arr Xp22.33q11.1(108 464-62 265 547)x1,4q31.22(145 136 590-145 239 234)x1,8p23.1(7 237 777-7 825 360)x3,9p24.3q32(1905-115 498 428)x3. Turner syndrome.
GM00074: 47,Y,t(X;14)(q13.2;q32.2),+der(14)t(X;14)(q13.2;q32.2)mat.arr Xq13.2q28(72 438 336-154 887 040)x2,14q11.1q32.2(18 072 111-96 533 541)x3. Klinefelter syndrome.
GM01730: 46,XX,der(21)(21qter>21p11::Xq11>Xqter)mat. Mental retardation and multiple anomalies with phenotypic similarities to 21-deletion syndrome (42).
GM08134: 46,X,der(X)t(X;21)(q22.3;q11.2)mat.arr Xq22.3q28(109 833 598-154 887 040)x1,21q11.2q22.3(13 286 389-46 921 373)x3. Hypotonia and dysmorphic features including epicanthal folds.
GM05396: 45,X,der(22)t(X;22)(Xqter>Xp11::22p12>22qter). Hypotonia and pixie-like appearance.

CpG density definitions 

The programme CpGIE (59) was used to locate three CpG
density classifications (HC, IC and low CpG (LC)) based on
those used by Weber et al. (24) on chr2, chr9, chr14, chr21,
chr22 and chrX using the hg18 genome build. The criteria for
each CpG density were as outlined below. HCs: GC content
>55%, an observed CpG/expected CpG ratio >0.75 and at least
500 bp (base pairs) in length. ICs: GC content >50%, an
observed CpG/expected CpG ratio >0.48 and at least 200 bp in
length. LCs were all those regions which were not HC or IC. Both
HC and IC were considered to be CpG islands and both are there- 
fore included in any 'CpG island' category discussed. The 
'non-CpG island' category is composed only of LCs. 
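
The three criteria can be applied to a candidate sequence as in the sketch below. It is an illustration only and does not reproduce CpGIE, which scans a chromosome for qualifying regions. The observed/expected ratio is assumed here to follow the commonly used definition (number of CpG dinucleotides × length)/(number of C × number of G); the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # assumed definition: (CpG count * length) / (C count * G count)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def classify_cpg_density(seq):
    """Classify one region as 'HC', 'IC' or 'LC' using the thresholds
    stated in the text."""
    gc_content, obs_exp = cpg_stats(seq)
    n = len(seq)
    if gc_content > 0.55 and obs_exp > 0.75 and n >= 500:
        return "HC"
    if gc_content > 0.50 and obs_exp > 0.48 and n >= 200:
        return "IC"
    return "LC"


print(classify_cpg_density("CG" * 300))  # 600 bp, CpG-dense
print(classify_cpg_density("CG" * 150))  # 300 bp, too short for HC
print(classify_cpg_density("AT" * 300))  # no C or G
```

Both 'HC' and 'IC' results would fall into the 'CpG island' category used in this paper.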

Illumina Infinium HumanMethylation450 array 

All samples were run on the Illumina Infinium HumanMethyla- 
tion450 array. Briefly, 160 ng of bisulfite converted DNA was 
whole-genome amplified, fragmented and hybridized to the Illu- 
mina Infinium HumanMethylation450 array following a stand- 
ard protocol as outlined in the user guide. Arrays were scanned 
on the Illumina iScan system and imported into GenomeStudio 
for further analysis (2010.2). Results were subjected to a
background normalization using BeadStudio (version 3.1.3.0;
Illumina, Inc.) and probes with P-values >0.05 or no beta values
were removed. Quantile normalization was performed in R
2.11.0 using the limma package (60). Before any DNAm
analysis was performed, four categories of probes were removed
from the Illumina Infinium HumanMethylation450 array.
First, all 'ch.' probes were removed as these probes represent
non-CpG DNAm and were therefore not of interest in this
paper. Secondly, all chr2, chr9, chr14, chr21, chr22 and chrX
genes were compared against a list of cancer-testis family
genes found at the CTdatabase (61) and all cancer-testis family
genes were removed from further analysis as they are typically
hypermethylated regardless of CpG density (62). All probes in which
the target CpG overlapped a repetitive element, contained a
single-nucleotide polymorphism or cross-hybridized to another
chromosome were removed from further analysis (40).
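
The probe exclusions described above amount to a sequence of simple filters, sketched below. The record layout (keys such as 'detection_p', 'overlaps_repeat' or 'has_snp') and the example probe and gene names are hypothetical and chosen for illustration; they do not correspond to GenomeStudio column names or to the annotation files used in this study.

```python
def keep_probe(probe, cancer_testis_genes, p_cutoff=0.05):
    """Return True if a probe passes all exclusion criteria."""
    if probe["detection_p"] > p_cutoff or probe["beta"] is None:
        return False  # failed detection or no beta value
    if probe["name"].startswith("ch."):
        return False  # non-CpG probe
    if probe["gene"] in cancer_testis_genes:
        return False  # cancer-testis family gene
    if (probe["overlaps_repeat"] or probe["has_snp"]
            or probe["cross_hybridizes"]):
        return False  # unreliable target CpG
    return True


# Illustrative records only
probes = [
    {"name": "cg0001", "detection_p": 0.001, "beta": 0.12, "gene": "GENE_A",
     "overlaps_repeat": False, "has_snp": False, "cross_hybridizes": False},
    {"name": "ch.2.001", "detection_p": 0.001, "beta": 0.40, "gene": "GENE_B",
     "overlaps_repeat": False, "has_snp": False, "cross_hybridizes": False},
    {"name": "cg0002", "detection_p": 0.20, "beta": 0.55, "gene": "GENE_C",
     "overlaps_repeat": False, "has_snp": False, "cross_hybridizes": False},
    {"name": "cg0003", "detection_p": 0.001, "beta": 0.90, "gene": "CT_GENE",
     "overlaps_repeat": False, "has_snp": False, "cross_hybridizes": False},
    {"name": "cg0004", "detection_p": 0.001, "beta": 0.30, "gene": "GENE_D",
     "overlaps_repeat": False, "has_snp": True, "cross_hybridizes": False},
]
retained = [p["name"] for p in probes if keep_probe(p, {"CT_GENE"})]
print(retained)
```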

External data sets 

Sequence features as well as processed high-throughput sequen- 
cing datasets generated by the ENCODE project were down- 
loaded from the University of California, Santa Cruz Genome 
Browser (using hg19 build) (44,45). Data sets were obtained
for repetitive sequences, G-banding, CpG islands, conserved
regions and large-scale experimental data. The term 'G-band
negative' refers to those bands classified as 'gneg' in the file
annotation while 'G-band positive' refers to 'gpos75' and
'gpos100' bands. The large-scale data include selected
ChIP-Seq, DNase I hypersensitivity and RNA-Seq expression
profiles; each class was restricted to data derived from
normal fibroblast cells, IMR90 and Normal Human Dermal
Fibroblast-Ad cells. The individual data sets are presented in
Supplementary Material, Table S1.

Definition and annotation of subject and escape genes 

All analyses were conducted using custom scripts in R (version 
2.15.2) and Bioconductor packages (version 2.12) unless other- 
wise stated. To assign CpG segments with subject or escape 
status to genes, we first obtained TSS information in hg19 build
from the EnsEMBL Genes 69 database through biomaRt (63),
and assigned the CpG segments to the genes of the most proximal
TSSs irrespective of strand. Of note, 230 and 1126 CpG segments
categorized as subjects and escapes were assigned to the
nearest TSSs of genes, and were further narrowed down into
212 and 993 unique TSSs, respectively.
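
Assignment of a CpG segment to its most proximal TSS can be sketched as a nearest-neighbour lookup on sorted coordinates. The example below assumes that each segment is represented by a single position (for instance its midpoint) on one chromosome and that strand is ignored; the function name, coordinates and gene labels are illustrative.

```python
from bisect import bisect_left


def nearest_tss(position, tss_positions, tss_genes):
    """Return (gene, tss) for the TSS closest to `position`.

    tss_positions must be non-empty, sorted ascending and on the same
    chromosome as `position`; tss_genes is the parallel list of genes.
    On a tie, the TSS with the lower coordinate is returned.
    """
    i = bisect_left(tss_positions, position)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - position))
    return tss_genes[best], tss_positions[best]


tss_positions = [1000, 5000, 9000]
tss_genes = ["GENE_A", "GENE_B", "GENE_C"]
segments = [4200, 4800, 100, 9500]  # several segments may share one TSS
assigned = [nearest_tss(s, tss_positions, tss_genes) for s in segments]
unique_tss = sorted({tss for _, tss in assigned})
print(assigned)
print(unique_tss)
```

Collapsing the assignments to unique TSSs, as in the last step, is what reduces the number of segments to a smaller number of genes.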



Assessing the significant association of features

The test of significance was determined with a Wilcoxon
rank-sum test of fragment densities between subject and
escape genes for each feature. Feature fragment density for
each TSS is computed as the number of nucleotides with the
feature over the total number of nucleotides tested. The tests
were performed on three genomic sections: 15 kb regions
upstream of the annotated proximal TSS (spanning −15 kb to
−1), 10 kb regions corresponding to segments spanning
±5 kb from the TSS, and the genic regions from +1 to the
end of the terminal exon. P-values of tests on features at the three
genomic sections were adjusted together for multiple hypothesis
testing using the q-values package (64). A composite score
was calculated across the three segments as the sum of −1 ×
log10(q-values) for each feature. Features with a low percentage
of the defined regions containing such feature (<10%) in both
subject and escapes at all three sections were excluded.
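
The quantities defined above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: intervals are half-open, the rank-sum P-value uses a normal approximation without tie or continuity correction (a statistics package would normally be used instead), and the q-value adjustment itself is not reproduced. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def fragment_density(feature_intervals, start, end):
    """Fraction of bases in [start, end) covered by feature intervals."""
    clipped = sorted((max(s, start), min(e, end))
                     for s, e in feature_intervals if s < end and e > start)
    covered, cursor = 0, start
    for s, e in clipped:
        s = max(s, cursor)  # skip bases already counted
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P-value (normal approximation)."""
    pooled = sorted((v, grp) for grp, vals in enumerate((x, y)) for v in vals)
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1..j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j)
                                     if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def composite_score(q_values):
    """Sum of -log10(q) across the three genomic sections."""
    return sum(-math.log10(q) for q in q_values)


print(fragment_density([(0, 10), (5, 20)], 0, 100))
print(rank_sum_p([1, 2, 3], [4, 5, 6]))
print(composite_score([0.1, 0.01, 0.001]))
```

A higher composite score indicates a feature that differs between subjects and escapes in more than one genomic section.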

Analysis of topological domains 

Topological domains from IMR90 and human ES (H1) cells in
hg18 build were downloaded from the Hi-C project website of
the Ren lab (39). The liftOver tool was used to convert
domains to hg19 build and only complete domains were retained
for subsequent analysis (65). We first assigned TSSs of subject
and escape genes to the domains in which they are located, and
focused on topological domains with more than one subject or
escape gene. Within each domain, an entropy value was
computed from counts of subject and escape genes. We then tested
the association by comparing the proportion of domains with
entropy = 0 to that of 10 000 randomized subject and escape
assignments within each domain with respect to the overall
subject and escape percentages. The overall P-value is estimated
to be the probability of the true proportion from the null normal
distribution estimated from randomization.
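
The entropy-based test can be sketched as below. Several details are assumptions made for illustration: entropy is taken as the Shannon entropy (base 2) of the subject/escape proportions in a domain, each randomization draws every gene as subject with the overall subject fraction, and the P-value is the upper-tail probability of the observed proportion under a normal distribution fitted to the randomized proportions. Function names and the toy counts are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def entropy(n_subject, n_escape):
    """Shannon entropy (bits) of the subject/escape split in a domain."""
    total = n_subject + n_escape
    h = 0.0
    for k in (n_subject, n_escape):
        if k:
            p = k / total
            h -= p * math.log2(p)
    return h


def zero_entropy_fraction(domains):
    """Proportion of domains containing only subjects or only escapes."""
    return sum(1 for s, e in domains if entropy(s, e) == 0) / len(domains)


def randomization_p(domains, n_iter=10000, seed=0):
    """Return (observed proportion, P-value from the fitted null normal).

    domains: list of (n_subject, n_escape) counts, one pair per domain.
    """
    rng = random.Random(seed)
    total = sum(s + e for s, e in domains)
    p_subject = sum(s for s, _ in domains) / total
    observed = zero_entropy_fraction(domains)
    null = []
    for _ in range(n_iter):
        shuffled = []
        for s, e in domains:
            n = s + e
            k = sum(rng.random() < p_subject for _ in range(n))
            shuffled.append((k, n - k))
        null.append(zero_entropy_fraction(shuffled))
    fitted = NormalDist(mean(null), stdev(null))
    return observed, 1 - fitted.cdf(observed)


# Toy counts only, not data from this study
toy_domains = [(3, 0), (0, 4), (2, 0), (0, 3), (5, 0), (0, 2)]
observed, p_value = randomization_p(toy_domains, n_iter=500)
print(observed, p_value)
```

An entropy of 0 corresponds to a domain in which every called gene has the same inactivation status, so a high observed proportion relative to the randomized assignments indicates segregation by domain.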

SUPPLEMENTARY MATERIAL 

Supplementary Material is available at HMG online. 